Abstract: Publishing social network data for research purposes has raised serious concerns about individual privacy. Many privacy-preserving techniques exist that can deal with different attack models. In this paper, we introduce a novel privacy attack model and refer to it as a mutual friend attack. In this model, the adversary can re-identify a pair of friends by using their number of mutual friends. To address this issue, we propose a new anonymity concept, called k-NMF anonymity, i.e., k-anonymity on the number of mutual friends, which ensures that for each friend pair there exist at least k-1 other friend pairs in the graph that share the same number of mutual friends. We devise algorithms to achieve k-NMF anonymity while preserving the original vertex set, in the sense that we allow the occasional addition but no deletion of vertices. Further, we give an algorithm to ensure k-degree anonymity in addition to k-NMF anonymity. The experimental results on real-world datasets demonstrate that our approach can preserve the privacy and utility of social networks effectively against mutual friend attacks.
I. Introduction

With the advance of mobile and Internet technology, more and more information is recorded by social network applications, such as Facebook and Twitter. The relationship information in social networks attracts researchers from different academic fields. As a consequence, more and more social network datasets have been published for research purposes [1]. The published social network datasets may invade the privacy of some individuals or groups. With increasing concerns about privacy, many works have been proposed for privacy-preserving social network publication [2], [3].

Tai and Yu proposed the friendship attack model [4], which addresses the issue that an attacker can find out not only the degree of a person but also the degree of his friend. Their solution defends against attacks based on the degrees of two connected vertices. But protecting against the friendship attack alone is not sufficient, as more information is available on the social network. For example, the graph in Fig. 2(a) is a $k^2$-degree anonymized graph with k = 2. If an attacker can obtain the number of mutual friends between two connected vertices, he can still distinguish (D, F) from the other friend pairs, as only (D, F) has 2 mutual friends. This will be explained in more detail later. In most social networking sites, such as Facebook, Twitter, and LinkedIn, the adversary can easily get the number of mutual friends of two individuals linked by a relationship. As shown in Figure 1, one can directly see the mutual friend list shared with one of his friends on Facebook. Usually, the adversary can get the friend lists of two individuals from Facebook, such as the friend list in Figure 1, and then get the number of mutual friends by intersecting their friend lists.

Figure 1: Friend lists on Facebook

In this paper, we introduce a new relationship attack model based on the number of mutual friends of two connected individuals, and refer to it as a mutual friend attack. Figure 2 shows an example of the mutual friend attack. The original social network G with vertex identities is shown in Figure 2(b) and can be naively anonymized as the network G' shown in Figure 2(c) by removing all individuals' names.



The number on each edge in G' represents the number of mutual friends of the two end vertices. Alice and Bob are friends, and their mutual friends are Carl, Dell, Ed and Frank. So the number of mutual friends of Alice and Bob is 4. After obtaining this information, the adversary can uniquely re-identify the edge (D, E) as (Alice, Bob). Likewise, (Alice, Carl) can be uniquely re-identified in G'. By combining (Alice, Bob) and (Alice, Carl), the adversary can uniquely re-identify the individuals Alice, Bob and Carl. This simple example illustrates that an adversary who can get the number of mutual friends of individuals may re-identify an edge between two individuals, and may even identify the individuals themselves. Note that we do not consider the mutual friend number of two nodes if they are not connected. For convenience, we refer to the number of mutual friends of two nodes connected by an edge e as the number of mutual friends of e.

Figure 2: Mutual friend attack in a social network

In order to protect the privacy of relationships from the mutual friend attack, we introduce a new privacy-preserving model, k-anonymity on the number of mutual friends (k-NMF anonymity). For each edge e, there will be at least k-1 other edges with the same number of mutual friends as e. This guarantees that the probability of an edge being identified is not greater than 1/k. We propose algorithms to achieve k-NMF anonymity for the original graph while preserving the original vertex set, in the sense that we allow the occasional addition but no deletion of vertices. By preserving the original vertex set, various analyses on the anonymized graph, such as identifying vertices with specific roles like centrality vertex, influential vertex, gateway vertex, outlier vertex, etc., will be more meaningful. The experimental results on real datasets show that our approaches can preserve much of the utility of social networks against mutual friend attacks.

Related Work. Backstrom et al. [5] pointed out that simply removing the identities of vertices cannot guarantee privacy. Many works have been done to prevent vertex re-identification based on the vertex degree. Liu et al. [6] studied k-degree anonymization, which ensures that for any node v there exist at least k-1 other vertices in the published graph with the same degree as v. Tai et al. [4] introduced a friendship attack, in which the adversary uses the degrees of the two end vertices of an edge to re-identify victims. Associating a community identity with each vertex, in [7] they proposed k-structural diversity anonymization, which guarantees the existence of at least k communities containing vertices with the same degree for each vertex. As these works only focus on the vertex degree, they cannot achieve k-NMF anonymity, which focuses on the number of common neighbors of two vertices.

Many works have also been done to prevent vertex re-identification based on subgraph structural information. Zhou and Pei [8] proposed a solution to battle the adversary's 1-neighborhood attacks. Cheng et al. [9] proposed the k-isomorphism model, which disconnects the original graph into k isomorphic subgraphs. To protect against multiple structural attacks, Zou et al. [10] proposed the k-automorphism model, which converts the original network into a k-automorphic network. But it does not prevent the mutual friend attack. The network in Figure 3(a) satisfies 2-automorphism, but the edge (3, 4) is not protected under the mutual friend attack. This is because the edge (3, 4) does not have mutual friends while all the others have one. Wu et al. [11] proposed the k-symmetry model, which gets a k-automorphic network by orbit copying. All these algorithms need to introduce many new vertices and adjust many edges to achieve their targets. Therefore, the utility of the original graph decreases considerably. In any case, these works are aimed at different types of attack model from ours, as illustrated in Figure 3(a).

Figure 3: Examples of the k-NMF anonymization

Hay et al. [12] proposed a generalization method for anonymizing a graph, which partitions the vertices and summarizes the graph at the partition level. Other works focus on the problem of link disclosure, which decides whether there exists a link between two individuals. It is different from the relationship re-identification introduced in this section.

Challenges. As the k-NMF anonymity model is more complicated than the k-degree anonymity model, more challenges need to be handled. First, adding or removing different edges may change the NMFs of different numbers of edges. In the k-degree anonymity model, the adversary attacks using the degree of the vertex. Adding an edge only increases the degrees of the two end vertices of this edge. In the k-NMF anonymity model, the adversary attacks using the number of mutual friends. Adding an edge can increase the numbers of mutual friends of many edges. In Figure 2(b), adding an edge between Dell and Frank will affect the NMFs of (Dell, Alice), (Frank, Alice), (Dell, Bob), (Frank, Bob), and (Dell, Frank). Second, we need to provide a criterion for choosing where to add or delete an edge while considering the utility of the graph. Since we aim to preserve the vertex set, we cannot add a vertex to connect an edge. In fact, we map the k-NMF anonymization problem into an edge anonymization problem, in contrast to the vertex anonymization problem in k-degree anonymization. Edges are anonymized one by one. Adding or deleting an edge should not destroy the anonymization of the already anonymized edges. To anonymize an edge, we can get many candidate edge operations and need to choose the best one. Besides, we need to consider the impact of the newly added edges on the number of mutual friends.

Contributions. Our contributions can be summarized as follows. (1) We introduce the k-NMF problem and formulate it as an edge weight anonymization problem where the edge weight is the NMF of the two end vertices. (2) We explore the geometric properties of the graph to devise effective anonymization algorithms while preserving the vertex set to achieve better utility. (3) For edge addition, we proceed in a breadth-first manner to preserve utility. We also introduce the maximum mutual friend criterion to break ties when selecting a candidate vertex to connect. (4) For edge deletion, we explore the triangle linking property to delete edges between vertices that already belong to a triangle in the network, which avoids repeated re-anonymization of edges. (5) We devise an algorithm which can anonymize the k-NMF anonymized graph to simultaneously satisfy k-degree anonymity, while preserving the vertex set. (6) The empirical results on real datasets show that our algorithms perform well in anonymizing real social networks.

The rest of the paper is organized as follows. We define the problem and design algorithms to solve it in Sections II and III. We conduct the experiments on real datasets and conclude in Sections IV and V.

II. Problem definition 

In this paper, we model a social network as an undirected simple graph G(V, E), where V is a set of vertices representing the individuals, and $E \subseteq V \times V$ is the set of edges representing the relationships between individuals.

Definition 1. The NMF of an edge. For an edge e between two vertices $v_1$ and $v_2$ in a graph G(V, E), i.e., $v_1, v_2 \in V$, $e \in E$ and $e = (v_1, v_2)$, the number of mutual friends of the edge e is the number of mutual friends of $v_1$ and $v_2$.

Let f be the number sequence of mutual friends for G, in which entries are sorted in descending order, i.e., $f_1 \ge f_2 \ge \cdots \ge f_m$. Let l be the list of edges corresponding to f, i.e., $f_i$ is the NMF of the edge $l_i$. For example, in Figure 4(c), f = {2, 2, 2, 2, 1, 1, 1, 1}, and l = {$(v_1, v_3)$, $(v_2, v_3)$, $(v_3, v_4)$, $(v_3, v_5)$, $(v_1, v_2)$, $(v_1, v_4)$, $(v_2, v_5)$, $(v_4, v_5)$}. Like the vertex degree, which follows a power law distribution [13], the NMF has the same property [14].
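The construction of f and l can be sketched in a few lines of Python. This is only an illustration: the helper names (`build_adjacency`, `nmf`, `nmf_sequence`) are hypothetical, and the edge set is the one read off the Figure 4(c) example above, with vertex $v_i$ written as the integer i.

```python
def build_adjacency(edges):
    """Map each vertex to the set of its neighbours (undirected simple graph)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def nmf(adj, a, b):
    """Number of mutual friends, i.e. common neighbours, of a and b."""
    return len(adj[a] & adj[b])


def nmf_sequence(edges):
    """Return (f, l): NMFs sorted in descending order and the matching edges."""
    adj = build_adjacency(edges)
    # sorted() is stable, so edges with equal NMF keep their input order.
    ranked = sorted(edges, key=lambda e: nmf(adj, e[0], e[1]), reverse=True)
    return [nmf(adj, a, b) for a, b in ranked], ranked


# Edges of the Figure 4(c) example, with vertex v_i written as the integer i.
FIG_4C = [(1, 3), (2, 3), (3, 4), (3, 5), (1, 2), (1, 4), (2, 5), (4, 5)]

f, l = nmf_sequence(FIG_4C)
print(f)  # [2, 2, 2, 2, 1, 1, 1, 1]
```

Storing neighbours as sets keeps each NMF computation a single set intersection, which is convenient for the small examples used here.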

Property 1. Scale-free distribution of NMFs [14]. The NMFs of edges in a large social network often have a scale-free distribution, which means that the distribution follows a power law, at least asymptotically.

Definition 2. Mutual friend attack. Given a social network G(V, E) and the anonymized network G'(V', E') for publishing, for an edge $e \in E$ the adversary can get the number $f_e$ of mutual friends of e. The mutual friend attack identifies all candidate edges $e' \in E'$ whose number $f_{e'}$ of mutual friends equals $f_e$.

Suppose that the candidate edge set of an edge e is $E'_e = \{e' \mid e' \in E', f_{e'} = f_e\}$. An adversary re-identifies the edge e with high confidence if the number of candidate edges is too small. Hence, we set a threshold k to make sure that for each edge $e \in E$, the number of candidate edges is no less than k, i.e., $|E'_e| \ge k$. We define the k-anonymous sequence before defining the k-NMF anonymous graph.

Figure 4: Examples of k-NMF anonymous graphs: (a) 3-NMF, (b) 6-NMF, (c) 4-NMF

Definition 3. k-anonymous sequence [6]. A sequence vector f is k-anonymous if, for any entry with value v, there exist at least k - 1 other entries with value v.

Definition 4. k-NMF. A graph G'(V', E') is k-NMF anonymous if the number sequence f' of mutual friends of edges in G' is a k-anonymous sequence.
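Definitions 2 to 4 translate directly into a short check. The sketch below is illustrative only: the function names are hypothetical, and the dictionary holds the NMF values of the Figure 4(c) example with vertex $v_i$ written as the integer i.

```python
from collections import Counter


def is_k_anonymous(sequence, k):
    """True if every value in the sequence occurs at least k times."""
    return all(count >= k for count in Counter(sequence).values())


def candidate_edges(published_nmf, f_e):
    """Edges of the published graph whose NMF equals the value the adversary knows."""
    return [edge for edge, value in published_nmf.items() if value == f_e]


# NMF of every edge in the Figure 4(c) example (vertex v_i written as i).
PUBLISHED = {(1, 3): 2, (2, 3): 2, (3, 4): 2, (3, 5): 2,
             (1, 2): 1, (1, 4): 1, (2, 5): 1, (4, 5): 1}

print(is_k_anonymous(PUBLISHED.values(), 4))  # True
print(is_k_anonymous(PUBLISHED.values(), 5))  # False
print(len(candidate_edges(PUBLISHED, 2)))     # 4
```

An adversary who knows that the target edge has 2 mutual friends is left with four candidates here, so the chance of picking the right edge is at most 1/4.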

Definition 4 states that for each edge $e \in E$, the number of candidate edges in G' is no less than k. Consider the graphs in Figure 4 as an example. There are three edges in Figure 4(a) and the NMFs of all these edges are equal to 1. Hence, this graph is a 3-NMF anonymous graph. As the six edges in the graph of Figure 4(b) have 2 mutual friends, this graph is a 6-NMF anonymous graph. The graph in Figure 4(c) has four edges $(v_1, v_3)$, $(v_2, v_3)$, $(v_3, v_4)$, $(v_3, v_5)$ with NMF 2, and the NMFs of the other four edges are equal to 1. Hence, this graph is a 4-NMF anonymous graph. Some properties of the number of mutual friends in the graph are described as follows.

Proposition 1. Given a graph G(V, E), the number of mutual friends of an edge $e \in E$ is equal to the number of triangles containing e in G.

Take the graph in Figure 4(c) as an example. The mutual friends of vertices $v_2$ and $v_3$ are $v_1$ and $v_5$, so the number of mutual friends of the edge $e = (v_2, v_3)$ is 2. It is equal to the number of triangles containing e. These triangles are $(v_1, v_2, v_3)$ and $(v_2, v_3, v_5)$.

Proposition 2. Let G(V, E) be a graph and f be the number sequence of mutual friends of edges in G, where |E| = m. Then $\sum_{i=1}^{m} f_i = 3 n_\triangle$, where $n_\triangle$ is the number of triangles in G and $f_i$ is the number of mutual friends of the i-th edge.
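Proposition 2 can be checked numerically on a small graph. The following sketch uses brute-force triangle counting, which is only suitable for tiny examples; the helper names are hypothetical and the edge set is again the Figure 4(c) example with vertex $v_i$ written as i.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def count_triangles(adj):
    """Count triangles by testing every vertex triple (fine for tiny graphs)."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def nmf_total(edges, adj):
    """Sum of the NMFs of all edges."""
    return sum(len(adj[a] & adj[b]) for a, b in edges)


FIG_4C = [(1, 3), (2, 3), (3, 4), (3, 5), (1, 2), (1, 4), (2, 5), (4, 5)]
adj = build_adjacency(FIG_4C)
print(nmf_total(FIG_4C, adj), 3 * count_triangles(adj))  # 12 12
```

Each triangle contributes one mutual friend to each of its three edges, which is why the two numbers agree.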

Different from the degree sequence in previous work [6], which maintains the number of entries in the sequence, the number sequence of mutual friends will have more entries added into it when new edges are added into the graph. Besides, according to Propositions 1 and 2, the number of mutual friends is related to the number of triangles in the graph. Therefore, adding one edge will affect the NMF of many edges, and adding a different edge may affect the NMF of a different number of edges. This can be illustrated by the example shown in Figure 3(b). After we add the edge (1, 2), the NMFs of all ten edges increase by one. If we add the edge (4, 7), only the NMFs of edges (1, 4), (1, 7), (2, 4), and (2, 7) increase by one. Therefore, one cannot anonymize a graph by simply minimizing the number of changed edges.
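The effect of a single edge addition can be computed from the common neighbours of its end vertices. In the sketch below, the edge set is an assumption: a complete bipartite graph between {1, 2} and {3, ..., 7}, chosen because it reproduces the counts quoted above; the actual Figure 3(b) may differ. The function name is hypothetical.

```python
def build_adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def affected_by_addition(adj, a, b):
    """Existing edges whose NMF rises by one if the edge (a, b) is added.

    Adding (a, b) closes one triangle per common neighbour z, and each such
    triangle contains the existing edges (a, z) and (b, z).
    """
    common = adj[a] & adj[b]
    return sorted(tuple(sorted((x, z))) for z in common for x in (a, b))


# Assumed stand-in for Figure 3(b): vertices 1 and 2 both linked to 3..7.
ASSUMED_FIG_3B = [(x, y) for x in (1, 2) for y in range(3, 8)]
adj = build_adjacency(ASSUMED_FIG_3B)

print(len(affected_by_addition(adj, 1, 2)))  # 10
print(affected_by_addition(adj, 4, 7))       # [(1, 4), (1, 7), (2, 4), (2, 7)]
```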

Anonymized Triangle Preservation Principle (ATPP). In our algorithms, we anonymize the edges in the graph one by one. An anonymized triangle is a triangle in which some edges have already been anonymized during the graph anonymization process. The Anonymized Triangle Preservation Principle aims to preserve the anonymized triangles containing already anonymized edges. It means that we neither create additional anonymized triangles via edge addition nor destroy any via edge deletion.

Creating (destroying) a triangle containing an already anonymized edge by edge addition (deletion) will increase (decrease) the NMF of this edge and thereby destroy its anonymization. The edge would then have to be anonymized again, possibly repeatedly. By preserving the anonymized triangles, we avoid this problem during the anonymization process.
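A minimal check of the ATPP for one edge operation might look as follows. This is a sketch under stated assumptions: edges are stored as sorted tuples, `anonymized` is a set of such tuples, and the function names are hypothetical. The graph is the Figure 4(c) example with vertex $v_i$ written as i.

```python
def build_adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def norm(a, b):
    """Canonical key for an undirected edge."""
    return (a, b) if a <= b else (b, a)


def respects_atpp(adj, anonymized, a, b):
    """True if adding (or deleting) the edge (a, b) touches no anonymized edge.

    The triangles created by adding (a, b), or destroyed by deleting it, are
    exactly those through the common neighbours z of a and b.
    """
    return all(
        norm(a, z) not in anonymized and norm(b, z) not in anonymized
        for z in adj[a] & adj[b]
    )


FIG_4C = [(1, 3), (2, 3), (3, 4), (3, 5), (1, 2), (1, 4), (2, 5), (4, 5)]
adj = build_adjacency(FIG_4C)

print(respects_atpp(adj, {(4, 5)}, 1, 5))  # False: triangle (1, 4, 5) would contain (4, 5)
print(respects_atpp(adj, {(1, 3)}, 2, 4))  # True
```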

Definition 5. k-NMF anonymization problem. Given a graph G(V, E) and an integer k, the problem is to anonymize the graph G to a k-NMF anonymous graph G' with edge addition and deletion, such that the vertex set of the original graph G is preserved.

III. k-NMF Anonymization Approach

In the above section, we found that changing one edge may affect the NMFs of other edges. To handle this challenge, we utilize the scale-free distribution property stated in Property 1 and introduce the principle of preserving the anonymized triangles. By exploring the geometric properties of the graph, we devise two effective anonymization algorithms that preserve utility while satisfying k-NMF anonymity.

A. Algorithm ADD 

In this subsection, we aim to anonymize the original graph only by edge addition. We organize edges into groups, and anonymize the edges in the same group to have the same NMF. The k-anonymity requires that there exist at least k edges in a group. Property 1 states that the NMFs of edges in large social networks follow a scale-free distribution. Hence, only a small number of edges have a high NMF. We first anonymize these edges, and many edges with low NMF do not need to be processed.

Suppose the original graph is G(V, E) and the gradually anonymized graph is G'(V', E'). Initially, we sort the NMF sequence f in descending order and construct the corresponding edge list l as described in Section II. We mark all edges as "unanonymized", and then anonymize the edges one by one. Iteratively, we start a new group GP with the group NMF, $g_f$, equal to the NMF of the first unanonymized edge in l. Then we select the edges with NMF equal to $g_f$ and mark them as "anonymized". We iteratively select the first unanonymized edge in l and anonymize it by adding edges to increase its NMF to $g_f$. After anonymizing this edge, we mark it as "anonymized" and put it into GP. Adding new edges affects the NMFs of some other edges, and these new edges will be added into f and l. Hence we re-sort the sequences f and l after each edge is anonymized. Algorithm 1 shows the detailed description of the ADD algorithm. Next, we consider when to start another new group.

1) Group edges: An intuitive method, named IntuitGroup, starts another group when the number of edges in the group GP is equal to k. Alternatively, to take the anonymization cost into account, we propose a greedy method, named GreedyGroup, to decide when to start another group once $|GP| \ge k$. Suppose that $f^{(u)}$ is the NMF sequence corresponding to the unanonymized edge list $l^{(u)}$. Notice that $f^{(u)}$ and $l^{(u)}$ are dynamically updated with f and l after anonymizing each edge. Similar to the consideration in [6], after putting k edges into GP, GreedyGroup iteratively checks whether it should merge the edge $l_1^{(u)}$ into GP or start another group. The decision is made according to the following two costs, based on the number of added mutual friends, in Eq. (1) and Eq. (2).

$$C_{merge} = (g_f - f_1^{(u)}) + I(f_2^{(u)}, f_{k+1}^{(u)}) \qquad (1)$$

$$C_{new} = I(f_1^{(u)}, f_k^{(u)}) \qquad (2)$$

where $I(f_i^{(u)}, f_j^{(u)}) = \sum_{t=i}^{j} (f_i^{(u)} - f_t^{(u)})$.

For Eq. (1), we put $l_1^{(u)}$ into GP. $l_1^{(u)}$ has $f_1^{(u)}$ mutual friends, so we need to add $g_f - f_1^{(u)}$ mutual friends for anonymizing $l_1^{(u)}$. To satisfy k-anonymity, we need to put at least k edges into a new group GP'. Hence we put the edges $l_2^{(u)}, \ldots, l_{k+1}^{(u)}$ into GP'. As we only add edges, the group NMF of GP' is the maximum NMF among $f_2^{(u)}, \ldots, f_{k+1}^{(u)}$, i.e., $f_2^{(u)}$. To anonymize $l_t^{(u)}$, $f_2^{(u)} - f_t^{(u)}$ mutual friends need to be added. For Eq. (2), we put $l_1^{(u)}, \ldots, l_k^{(u)}$ into a new group GP', and the group NMF of GP' is $f_1^{(u)}$. Hence $C_{merge}$ is the cost for anonymizing k + 1 edges while $C_{new}$ is the cost for k edges. So if $C_{merge}$ is less than $C_{new}$, we anonymize $l_1^{(u)}$ and merge it into GP, and check the next unanonymized edge. Otherwise we start another new group with $l_1^{(u)}$.

For each edge e, GreedyGroup looks ahead at O(k) other edges to decide whether to merge e into the current group or to start a new group. Therefore, the time complexity of GreedyGroup is O(k|E|).
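The merge decision of GreedyGroup depends only on the current group NMF and the first k + 1 entries of the unanonymized NMF sequence, so it can be sketched independently of the graph. The function names below are hypothetical, and the input values are made up for illustration. The sequence is assumed to be sorted in descending order and to contain at least k + 1 entries, which holds whenever the clean-up operation has not yet been triggered.

```python
def span_cost(values, i, j):
    """I(f_i, f_j): cost of raising entries i..j (1-based) to the value of entry i."""
    return sum(values[i - 1] - values[t - 1] for t in range(i, j + 1))


def merge_cost(g_f, unanonymized, k):
    """Eq. (1): merge the first edge into GP, then group the next k edges."""
    return (g_f - unanonymized[0]) + span_cost(unanonymized, 2, k + 1)


def new_group_cost(unanonymized, k):
    """Eq. (2): start a new group with the first k unanonymized edges."""
    return span_cost(unanonymized, 1, k)


def should_merge(g_f, unanonymized, k):
    """GreedyGroup merges only when merging is strictly cheaper."""
    return merge_cost(g_f, unanonymized, k) < new_group_cost(unanonymized, k)


# Illustrative values: current group NMF 5, unanonymized NMFs [4, 2, 2, 1], k = 3.
print(merge_cost(5, [4, 2, 2, 1], 3))      # 2
print(new_group_cost([4, 2, 2, 1], 3))     # 4
print(should_merge(5, [4, 2, 2, 1], 3))    # True
```

In this example the first unanonymized edge is close to the current group and far from the edges that follow it, so absorbing it into the current group is cheaper than making it the head of a new one.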

2) Cleanup-operation: In each iteration, the ADD algorithm checks the number of unanonymized edges, $n_u$. If $n_u < 2k$, the remaining edges are put into one group; and if $n_u < k$, $k - n_u$ edges need to be added following the ATPP, so that these k edges can form a group. New vertices will be added into the graph if the ATPP cannot be satisfied.

Algorithm 1 The ADD Algorithm (GreedyGroup)

Input: Original graph G(V, E), k
Output: k-NMF anonymized graph G'(V', E')
Initialization: G' = G, and mark all edges as "unanonymized". Compute and sort the sequences f and l. $f^{(u)} = f$, $l^{(u)} = l$, $G_f = \emptyset$.
while $l^{(u)} \ne \emptyset$ do
    if $|l^{(u)}| < 2k$ then do cleanup-operation and break.
    $GP = \{e \mid e \in l^{(u)} \text{ and } f_e = f_1^{(u)}\}$; $g_f = f_1^{(u)}$; $G_f = G_f \cup g_f$.
    Mark any $e \in GP$ as "anonymized"; update $f^{(u)}$ and $l^{(u)}$.
    while $|GP| < k$ or ($|GP| \ge k$ and $C_{merge} < C_{new}$) do
        Anonymize $l_1^{(u)}$ by BFSEA. $GP = GP \cup l_1^{(u)}$, update $l^{(u)}$ and $f^{(u)}$.
    end while
end while
return G'(V', E').

Next, we anonymize the edges $E_u$ in this group. Usually, we set the group NMF as the largest NMF among the unanonymized edges, denoted as $g_f$. Then we sum the differences as $sd = \sum_{e \in E_u} (g_f - f_e)$, where $f_e$ is the number of mutual friends of the edge e. If $sd \ge k/2$, then we add sd vertices and $2 \cdot sd$ edges into the graph. That is, for each unanonymized edge e, we add $g_f - f_e$ vertices and link each of them with the two end vertices of e. As all the newly added edges ($2 \cdot sd \ge k$ of them) have only one mutual friend, they can form a new group. Then we mark the new edges as "anonymized" and the task is complete. If $sd < k/2$, then we enlarge the group NMF, $g_f = g_f + 1$, and repeat the above process. With the clean-up operation, we can successfully anonymize the original network at the last step of the anonymization process.
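The choice of the group NMF in the clean-up step can be sketched as a small loop over the NMFs of the remaining edges. The function name `cleanup_plan` is hypothetical, the input values are made up, and the sketch covers only the arithmetic of the rule above, not the actual insertion of vertices and edges.

```python
def cleanup_plan(nmfs, k):
    """Return (g_f, sd) for the last group, following the clean-up rule.

    nmfs holds the NMFs of the remaining unanonymized edges (non-empty).
    g_f starts at the largest NMF and grows until 2 * sd >= k, i.e. until
    the 2 * sd newly added edges are enough to form a group of their own.
    """
    g_f = max(nmfs)
    while True:
        sd = sum(g_f - f_e for f_e in nmfs)
        if 2 * sd >= k:  # same test as sd >= k / 2, without fractions
            return g_f, sd
        g_f += 1


g_f, sd = cleanup_plan([3, 3, 2], k=4)
print(g_f, sd)  # 4 4: add 4 new vertices and 8 new edges
```

The loop terminates because every increment of $g_f$ raises sd by the number of remaining edges, which is at least one.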

B. BFS-based Edge Anonymization (BFSEA)

In this section, we consider how to anonymize an edge by edge addition while preserving the utility. There are three challenges in increasing the NMF of an edge by adding edges. First, the added edge should not affect the NMF of already anonymized edges. Second, the added edge should minimize the effect on the utility of the graph. Third, the NMF of the newly added edges should not disrupt the current anonymization process, which progresses in descending order of the NMF value.

Before anonymizing an edge (u, v), the ADD algorithm has created some anonymized groups and obtained a set $G_f$ containing the group NMFs of these groups. Let $g_f$ be the NMF of the current group GP, and we put $g_f$ into $G_f$. Anonymizing the edge (u, v) means that we need to increase the NMF of (u, v) to the current group NMF $g_f$, i.e., we need to create some new triangles containing this edge. We therefore try to find some candidate vertices and add new edges to create new triangles. Considering the utility of the graph, we find the candidate vertices based on Breadth First Search (BFS).

From the nodes u and v, BFS-based Edge Anonymization traverses the graph in a breadth-first manner. For the i-hop neighbors of u and v, represented by $neig_i(u)$ and $neig_i(v)$, edge anonymization finds the candidate vertices from $neig_i(u) \cup neig_i(v)$ and iteratively links the best one with u or v to create a new triangle. We denote the NMF of the edge (u, v) as nmf(u, v).

1) Candidates generation: We search the candidate vertices for edge (u, v) in a BFS manner. Among the i-hop neighbors of u and v, many vertices cannot be candidate vertices because they violate the ATPP. A vertex w needs to satisfy the following conditions to be a candidate in the set $CV_i$.

a) $w \in neig_i(u) \cup neig_i(v)$.

b) $(w, u, v) \ne \triangle$.

c) $\forall x \in \{u, v\}$ and $z \in V$, if $(w, x) \notin E'$, $(w, z) \in E'$ and $(x, z) \in E'$, then (x, z) and (w, z) must be unanonymized.

Condition b) states that (u, v, w) is not yet a complete triangle, so edges need to be added to create a new triangle. This mainly concerns the case i = 1, where w may be linked with both u and v. Condition c) follows the ATPP, which guarantees that there will be no effect on the already anonymized edges.
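The three conditions can be sketched as a filter over the i-hop neighbours. This is an illustration under assumptions that are not spelled out in the text: $neig_i(x)$ is taken to be the set of vertices at shortest-path distance exactly i from x, u and v themselves are excluded from the candidates, and condition c) is read as applying to the edge (w, x) that would have to be added. All function names and the small example graph are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def norm(a, b):
    """Canonical key for an undirected edge."""
    return (a, b) if a <= b else (b, a)


def hop_neighbours(adj, source, i):
    """Vertices whose shortest-path distance from source is exactly i."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        if dist[cur] == i:
            continue  # no need to expand beyond hop i
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return {w for w, d in dist.items() if d == i}


def candidates(adj, anonymized, u, v, i):
    """Candidate set CV_i for the edge (u, v)."""
    # Condition a): w is an i-hop neighbour of u or of v.
    pool = (hop_neighbours(adj, u, i) | hop_neighbours(adj, v, i)) - {u, v}
    result = set()
    for w in pool:
        # Condition b): (w, u, v) must not already be a triangle.
        if w in adj[u] and w in adj[v]:
            continue
        # Condition c): adding a missing edge (w, x) must not close a
        # triangle that contains an anonymized edge.
        safe = all(
            norm(x, z) not in anonymized and norm(w, z) not in anonymized
            for x in (u, v) if w not in adj[x]
            for z in adj[w] & adj[x]
        )
        if safe:
            result.add(w)
    return result


EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 5), (4, 6)]  # made-up example
adj = build_adjacency(EDGES)
print(candidates(adj, set(), 1, 2, 1))     # {4, 5}
print(candidates(adj, {(1, 4)}, 1, 2, 1))  # {5}
```

Vertex 3 is rejected by condition b) because it already forms a triangle with the edge (1, 2), and vertex 4 is rejected by condition c) once the edge (1, 4) is anonymized.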

2) Candidates selection: After getting all the candidate vertices satisfying the conditions, we can add new edges between u, v and $w \in CV_i$ to increase the NMF of (u, v). We iteratively select a vertex from $CV_i$ to increase the NMF of (u, v) until nmf(u, v) reaches $g_f$ or $CV_i$ is empty. If nmf(u, v) = $g_f$, this edge is anonymized successfully.

In each iteration, we need to select the best candidate, i.e., the one which preserves the most utility of the graph. Based on link prediction theory [15], we select the candidate vertex $w_{max}$ for which $nmf'(w_{max}, u) + nmf'(w_{max}, v)$ is maximum, where nmf'(w, x) is defined in Eq. (3).

$$nmf'(w, x) = \begin{cases} 0 & (x, w) \in E' \\ nmf(w, x) & \text{otherwise} \end{cases} \qquad (3)$$

where $x \in \{u, v\}$. This is referred to as the maximum mutual friend criterion for adding edges. The more mutual friends two vertices have, the less impact the edge addition will have on the utility of the graph.

The selection criterion described in Eq. (3) can only be used for candidates among the 1-hop and 2-hop neighbors. For all candidates w among the i-hop neighbors with $i \ge 3$, the NMF of (x, w) is 0. In this situation, we randomly select a candidate vertex $w_{max}$ from $CV_i$.
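The maximum mutual friend criterion and its random fallback can be sketched as follows. The function names, the tie-breaking by smallest vertex label, and the example graph are assumptions made for illustration; the text does not specify how ties among equally good candidates are resolved.

```python
import random


def build_adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def nmf_prime(adj, w, x):
    """Eq. (3): 0 if (x, w) is already an edge, otherwise the NMF of w and x."""
    return 0 if w in adj[x] else len(adj[w] & adj[x])


def select_candidate(adj, cv, u, v, hop, rng=random):
    """Pick w_max from the candidate set cv for the edge (u, v)."""
    if not cv:
        return None
    ordered = sorted(cv)  # fixed order, so ties go to the smallest label
    if hop >= 3:
        # Beyond 2 hops every candidate has no mutual friend with u or v.
        return rng.choice(ordered)
    return max(ordered, key=lambda w: nmf_prime(adj, w, u) + nmf_prime(adj, w, v))


EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 5), (3, 5), (4, 6)]  # made-up example
adj = build_adjacency(EDGES)
print(select_candidate(adj, {4, 5}, 1, 2, hop=1))  # 5
```

Vertex 5 shares two friends (2 and 3) with vertex 1, while vertex 4 shares only one friend with vertex 2, so linking 5 with 1 is preferred.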

As we anonymize edges in descending order of NMF, we must consider the different situations for the NMF of the new edge $(x, w_{max})$. In the situation $nmf(x, w_{max}) \ge g_f$, if $nmf(x, w_{max})$ is not equal to any group NMF in $G_f$, $(x, w_{max})$ cannot be added into the graph. This is because we cannot anonymize this edge when anonymizing in descending order. Otherwise, we add $(x, w_{max})$, mark it as "anonymized", and put it into the group whose NMF equals $nmf(x, w_{max})$. If $nmf(x, w_{max}) < g_f$, we add $(x, w_{max})$ and mark it as "unanonymized".

3) Candidates dynamic removal: After a new triangle is created with the vertex $w_{max} \in CV_i$, we need to consider the effect of this triangle on the other candidate vertices in $CV_i$. To ensure that linking u or v with vertices in $CV_i$ follows the ATPP, some vertices will be dynamically removed from $CV_i$.

If $w \in CV_i$ is connected with the selected vertex $w_{max}$ and the edge $(w, w_{max})$ is anonymized, then we remove w from $CV_i$. This is because adding either (w, u) or (w, v) creates a new triangle containing $(w, w_{max})$, and destroys the anonymization of $(w, w_{max})$.

For any vertex $w \in CV_i$ for which $(w, w_{max})$ is unanonymized, if $(w_{max}, x)$ is anonymized and $(w, x) \notin E'$, then we remove w from $CV_i$. This is because if we selected this w as a new maximum vertex, we would need to add (w, x) to create a triangle containing (u, v), which would at the same time create a triangle containing $(w_{max}, x)$. This would destroy the anonymization of the edge $(w_{max}, x)$.

4) Edge anonymization: From the nodes u and v, BFS-based Edge Anonymization traverses the graph in a breadth-first manner. BFSEA iteratively generates a candidate set $CV_i$ from the i-hop neighbors of u and v, where i increases from 1 to $\infty$. After getting the candidate set $CV_i$, BFSEA iteratively selects the best vertex from $CV_i$ by candidates selection and creates a triangle to increase the NMF of (u, v), then updates $CV_i$ by the candidates dynamic removal. These operations stop when the NMF of (u, v) reaches the current group NMF $g_f$ or no more candidate vertices can be found in the whole graph.

If nmf(u, v) reaches the current $g_f$, i.e., (u, v) is anonymized successfully, we mark it as "anonymized". If BFSEA cannot successfully anonymize this edge, adding new vertices can achieve the task. Linking one new vertex with the end vertices of this edge increases the NMF of this edge by 1. The newly added edges have only one mutual friend, and will be anonymized at the last step of the anonymization algorithm. The above scenario is a pathological case that rarely occurs: in our experiments, no new vertices were added in any case.

Working in a breadth-first manner, BFSEA first links u or v with a vertex w from the 1-hop neighbors. Thus after (x, w) is added, the shortest path length (SPL) between $x \in \{u, v\}$ and w only decreases from 2 to 1, with little effect on the utility. Then we gradually increase the value of i, and link u and v with w from the i-hop neighbors, which decreases the SPL between x and w from i to 1. Hence, we prefer candidates from the i-hop neighbors with smaller i, i.e., the breadth-first manner, which has less effect on the utility of the graph.

To get $neig_i(u)$ and $neig_i(v)$ for every i, we execute a breadth-first search with time complexity $O(|V| + |E|)$. When i = 1, we need to compute $neig_1(u) \cap neig_1(v)$ to ensure $(w, u, v) \ne \triangle$ as stated in the candidates generation, and the time complexity is $O(|V|)$. When $i \le 2$, to get the best candidate from $CV_i$, we compute nmf(w, x), $x \in \{u, v\}$, with time complexity $O(|V|)$. Hence, for each candidate set, the total running time of the NMF computation is $O(|V|^2)$. When $i \ge 3$, we randomly select a candidate from $CV_i$ to create a triangle, and the time complexity is $O(1)$. Hence, the time complexity of candidates selection is $O(|V|^2)$. Therefore, the time complexity of BFSEA is $O(|V|^2)$.

As there are $O(|E|)$ edges that need to be anonymized, the time complexity of the ADD algorithm is $O(|E||V|^2)$.

C. Algorithm ADD&DEL

Usually, anonymization that combines edge deletion with addition will remove or add fewer edges than anonymization that applies edge addition only. It can therefore improve the utility of the anonymized graph. Before introducing the ADD&DEL algorithm, we discuss how to anonymize an edge by edge deletion.

Edge-deletion. For an unanonymized edge (u, v), the algo- 
rithm finds any candidate edge (x,w), where x is u or v, 
which satisfies the following conditions. 

a) Both (u,w) and (v,w) exist and are unanonymized. 

b) For any vertex z linked with x and w, edges (x, z) 
and (w, z) are still unanonymized. 

c) If both (u,w) and (v,w) satisfy condition b), we 
choose the one with fewer mutual friends. 

Condition c) is the reverse of the maximum mutual friend criterion for adding edges. The fewer the mutual friends, the weaker the relationship. Hence dropping the edge has less impact on the utility. After (x, w) is deleted, the shortest path length between x and w only increases from 1 to 2, with little effect on the utility. Conditions a) and b) follow the anonymized triangle preservation principle to guarantee that there will be no effect on the already anonymized edges.
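The edge-deletion conditions can be sketched as a search over the mutual friends of u and v. The function name, the return format (a list of (NMF, edge) pairs sorted so that the weakest edge comes first), and the choice of test graph are assumptions made for illustration; the graph is the Figure 4(c) example with vertex $v_i$ written as i.

```python
def build_adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def norm(a, b):
    """Canonical key for an undirected edge."""
    return (a, b) if a <= b else (b, a)


def deletion_candidates(adj, anonymized, u, v):
    """Edges (x, w), x in {u, v}, whose removal lowers nmf(u, v) by one."""
    chosen = []
    for w in adj[u] & adj[v]:
        # Condition a): (u, w) and (v, w) exist and are both unanonymized.
        if norm(u, w) in anonymized or norm(v, w) in anonymized:
            continue
        options = []
        for x in (u, v):
            # Condition b): every triangle through (x, w) consists of
            # unanonymized edges only.
            if all(norm(x, z) not in anonymized and norm(w, z) not in anonymized
                   for z in adj[x] & adj[w]):
                options.append((len(adj[x] & adj[w]), norm(x, w)))
        if options:
            # Condition c): prefer the edge with fewer mutual friends.
            chosen.append(min(options))
    return sorted(chosen)


FIG_4C = [(1, 3), (2, 3), (3, 4), (3, 5), (1, 2), (1, 4), (2, 5), (4, 5)]
adj = build_adjacency(FIG_4C)
print(deletion_candidates(adj, set(), 1, 3))     # [(1, (1, 2)), (1, (1, 4))]
print(deletion_candidates(adj, {(1, 2)}, 1, 3))  # [(1, (1, 4))]
```

For the edge (1, 3), deleting (1, 2) or (1, 4) is preferred over deleting (2, 3) or (3, 4), because the former have one mutual friend each while the latter have two.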

For an unanonymized edge (u, v), edge-deletion initially finds all candidate edges satisfying the edge-deletion conditions, and then puts them into the set CE. During each iteration, the edge $e_{min} \in CE$ with the fewest mutual friends is removed from the graph and from the set CE. The algorithm stops when the NMF of (u, v) reaches the group NMF $g_f$ or CE becomes an empty set. If CE is empty and the NMF of (u, v) is not equal to $g_f$, the anonymization of (u, v) is unsuccessful; otherwise, we successfully anonymize this edge and mark it as "anonymized".

Edge-deletion is the reverse of the methods in the ADD algorithm. The running time is dominated by the computation of mutual friends, so the complexity of edge-deletion is $O(|V|^2)$.

The ADD&DEL Algorithm. This algorithm is shown in Algorithm 2,
which anonymizes the graph by edge addition and deletion.
Similar to the ADD algorithm, ADD&DEL checks the number of
unanonymized edges with NMF equal to the NMF of the first
unanonymized edge in the sorted sequence l^(u).
If there are more than k edges, we put them into this group



Algorithm 2 The ADD&DEL Algorithm

Input: Original graph G(V, E), k
Output: k-NMF anonymized graph G'(V', E')
Initialization: G' = G, and mark all edges as "unanonymized". Compute
and sort the sequences f and l. f^(u) = f, l^(u) = l, G_f = ∅
1: while l^(u) ≠ ∅ do
2:   if |l^(u)| < 2k then do cleanup-operation and break.
3:   EE = {e | e ∈ l^(u) and f_e = f_1^(u)};
4:   if |EE| > k, then new group GP = EE, and mark any e ∈ GP as
     anonymized. G_f = G_f ∪ f_1^(u), update f^(u) and l^(u), and continue.
5:   GP = ∅, g_f = round(mean(f_1^(u), ..., f_k^(u))). Record all initial info.
6:   while f_1^(u) > g_f do
7:     Anonymize l_1^(u) by edge-deletion.
8:     if anonymize failed, then roll back to initial info, and g_f = g_f + 1;
       else mark l_1^(u) as anonymized and GP = GP ∪ l_1^(u); update f^(u)
       and l^(u).
9:   end while
10:  G_f = G_f ∪ g_f.
11:  while |GP| < k do
12:    Anonymize l_1^(u) by BFSEA. GP = GP ∪ l_1^(u), update f^(u) and l^(u).
13:  end while
14: end while
15: return G'(V', E').

and start another group. Otherwise, we need to anonymize 
edges to form this group. To gradually anonymize edges and 
create this group, we initially set the group NMF, gf, as the 
mean value of NMFs of the first k unanonymized edges. We 
record all initial information before anonymizing this group. 
For the unanonymized edge with NMF greater than gf, we 
use edge-deletion to anonymize it. If we cannot successfully 
anonymize this edge, we set gf = gf + 1 and roll back to all 
initial information. For the unanonymized edge with NMF 
less than gf, we apply the ADD algorithm to anonymize 
it. We gradually anonymize unanonymized edges in sorted 
sequence until this group has k edges, and start another 
group. 

In the ADD&DEL algorithm, an edge is anonymized either by
edge-deletion or by the methods of the ADD algorithm.
Therefore, the time complexity of anonymizing an edge
is O(|V|^2), and the time complexity of the ADD&DEL
algorithm is O(|E||V|^2).
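
The choice of the group NMF and the split between deletion and
addition can be illustrated with a short Python sketch. It assumes
the NMF sequence is sorted in descending order (as the deletion
loop in Algorithm 2 suggests) and that round() means rounding half
up; the paper does not state the rounding rule, and the names
group_target and plan_group are hypothetical. Rollback and the
actual graph edits are omitted.

```python
def group_target(nmfs_desc, k):
    """Group NMF g_f: rounded mean of the first k NMF values.

    Rounding half up is an assumption; Python's built-in round()
    rounds halves to the nearest even integer instead.
    """
    first = nmfs_desc[:k]
    return int(sum(first) / len(first) + 0.5)


def plan_group(nmfs_desc, k):
    """Return g_f and the operation each of the first k edges needs."""
    g_f = group_target(nmfs_desc, k)
    plan = []
    for f in nmfs_desc[:k]:
        if f > g_f:
            plan.append("delete")   # lower the NMF by edge-deletion
        elif f < g_f:
            plan.append("add")      # raise the NMF by edge addition
        else:
            plan.append("keep")     # already at the group NMF
    return g_f, plan
```

For example, with NMF values 5, 4, 2, 1 and k = 4 the group NMF
is 3, so the first two edges would go through edge-deletion and
the last two through edge addition.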

D. k1-degree Anonymization Based on k2-NMF Anonymization

In this subsection, we propose the KDA algorithm, which
anonymizes the k2-NMF anonymized graph G to satisfy
k1-degree anonymity. To maintain the k2-NMF anonymity of
G, the KDA algorithm does not change the NMF of any edge
in G when performing anonymization. Proposition 1 states
that the NMF of an edge is determined by the number of
triangles in which this edge participates, so we anonymize
the graph G without adding new triangles, i.e., we follow the
anonymized triangle preservation principle. We can connect two
vertices whose shortest path length (SPL) is no less than three
to guarantee that no new triangles are introduced: a new
triangle would require a common neighbor, which would imply an
SPL of at most two. The NMF of the newly added edge is then
zero. As the degree distribution of the social network follows
the power law [13], we only need to anonymize the vertices
with large degrees.



The KDA algorithm is similar to the ADD algorithm.
The unanonymized vertices are sorted in descending order of
their degrees. We gradually group and anonymize them using
edge addition only. The vertices in the same group have the
same degree. To start a new group, KDA sets the group degree
g_d to the greatest degree among the unanonymized vertices. If
there are fewer than k vertices in this group, we anonymize the
unanonymized vertices in descending order of their degrees.
If this group has more than k vertices, we compute
C_merge and C_new for the next unanonymized vertex, and
decide whether to put it into this group or start a new group.

Let neig_i(u) denote the set of i-hop neighbors of vertex u.
To anonymize the unanonymized vertex u, KDA iteratively
selects a random unanonymized vertex w_max from
neig_3(u) and connects u and w_max. If the vertex u cannot
be anonymized, KDA updates neig_3(u) based on the
newly added edges and repeats the above process. If u still
cannot be anonymized, we select the candidate vertex from
neig_4(u), neig_5(u) and so on until u is anonymized.
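
The candidate search can be sketched in Python with a breadth-first
traversal. This is only an illustration under stated assumptions:
the graph is a dict of neighbor sets, vertex labels are sortable,
and the names hop_layers and pick_partner are hypothetical. The
sketch returns one partner and leaves the edge insertion and the
repeated updates of neig_i(u) to the caller.

```python
from collections import deque
import random


def hop_layers(adj, u):
    """Return {i: set of vertices at shortest-path distance i from u}."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    layers = {}
    for vertex, d in dist.items():
        layers.setdefault(d, set()).add(vertex)
    return layers


def pick_partner(adj, u, unanonymized, rng):
    """Pick a random unanonymized vertex from neig_3(u), then neig_4(u), ...

    Only vertices at distance 3 or more are considered, so the new
    edge cannot close a triangle. Returns None if no candidate exists.
    """
    layers = hop_layers(adj, u)
    for i in sorted(layers):
        if i < 3:
            continue
        pool = sorted(layers[i] & unanonymized)  # sorted for reproducibility
        if pool:
            return rng.choice(pool)
    return None
```

Passing a seeded random.Random instance as rng keeps the selection
reproducible across runs.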

When anonymizing a vertex, the KDA algorithm searches
the graph in a breadth-first manner to get the candidate
vertices. In the worst case, KDA searches the whole
graph and the time complexity is O(|E| + |V|). As there
are O(|V|) vertices to be anonymized, the time
complexity of the KDA algorithm is O(|E||V| + |V|^2) in
the worst case.

IV. Experimental Results 

In this section, we conduct experiments on real data 
sets to evaluate the performance of the proposed graph 
anonymization algorithms. 

A. Datasets 

We conduct our experiments on three real datasets: ACM,
Cora, and Brightkite. All datasets are preprocessed into
simple undirected graphs without self-loops or multiple
edges. We also remove the isolated vertices from each graph.

ACM: This dataset was extracted from the ACM Digital
Library. We extracted papers published in 12 computer science
conference proceedings before the year 2011 and
derived a graph describing the citations between papers.
If one paper cites another, an undirected edge
connects the two corresponding vertices. The graph includes
7,315 vertices and 16,203 edges.

Cora: This dataset is composed of a number of scientific
papers on computer science [16]. We extract the
collaborations between authors to derive the graph. Two authors
are connected if they have co-authored at least one paper. After
we removed the authors without any collaboration, the graph
contains 14,076 vertices and 72,871 edges.

Brightkite: This dataset shows the friendships between
users in the social network Brightkite over the period of
April 2008 to October 2010. The graph consists of 58,228
nodes and 214,078 edges, and is available from SNAP.

Table I: The numbers of vertices violating k-degree
anonymization and edges violating k-NMF anonymization

B. Mutual Friend Attack in Real Data 

In the k-degree anonymization model, the adversary
re-identifies a vertex using the degree of this vertex. In the
k-NMF anonymization model, the adversary re-identifies an
edge using the NMF of this edge. We compare both attacks
on the real datasets listed in Subsection IV-A and show the
results in Table I. We removed all labels in the three datasets.
From Table I we can see that the number of edges violating
k-NMF anonymity can be sizable when k is set from 5 to
100, so it is easy for an adversary to mount a mutual
friend attack. The k-NMF anonymization problem can be seen
as a parallel of the k-degree anonymization problem.
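
Counts of the kind reported in Table I can be obtained with a few
lines of Python. The sketch below assumes a dict of neighbor sets
with sortable vertex labels; the function names are hypothetical
and the code is not taken from the paper. An edge (or vertex)
violates k-anonymity when fewer than k edges (or vertices) in the
whole graph share its NMF (or degree).

```python
from collections import Counter


def violating_edges(adj, k):
    """Edges whose NMF value is shared by fewer than k edges."""
    nmf = {}
    for a in adj:
        for b in adj[a]:
            if a < b:  # visit each undirected edge once
                nmf[(a, b)] = len(adj[a] & adj[b])
    freq = Counter(nmf.values())
    return [e for e, f in nmf.items() if freq[f] < k]


def violating_vertices(adj, k):
    """Vertices whose degree is shared by fewer than k vertices."""
    freq = Counter(len(nbrs) for nbrs in adj.values())
    return [v for v, nbrs in adj.items() if freq[len(nbrs)] < k]
```

On a triangle with one pendant vertex and k = 2, the pendant edge
is the only edge with NMF 0, so it is the only violating edge.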

C. Evaluating k-NMF Anonymization Algorithms 

We evaluate the performance of the Greedy and Intuitive
ADD algorithms and the ADD&DEL algorithm by measuring
the average clustering coefficient, average path length,
betweenness centrality and the ratio of edges changed.
Figures 5-8 show the results, where ADD-Int and ADD-Gre
stand for the ADD algorithm with IntuitGroup and
GreedyGroup respectively, and ADD&DEL stands for the ADD&DEL
algorithm.

Average Clustering Coefficient (CC): We first compare
the average clustering coefficients of the anonymized graphs
with those of the original graphs; the results are shown in
Figure 5. The CC values on the ACM and Brightkite datasets
increase as k increases, but decrease on the Cora dataset as k
increases. Hence no clear trend in the CC change can be
concluded. However, the average clustering coefficients derived
by our three methods deviate only slightly from the original
values on all three datasets. ADD&DEL performs better
than the two ADD algorithms in Figure 5, and the ADD
algorithm with GreedyGroup looks slightly better than the
algorithm with IntuitGroup.
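
For reference, the average clustering coefficient can be computed
as in the Python sketch below. The paper does not say which
definition it uses; the sketch assumes the common one, the mean of
the local clustering coefficients with vertices of degree below
two counted as zero. The function name and the dict-of-sets graph
representation are assumptions.

```python
def average_clustering(adj):
    """Mean local clustering coefficient of an undirected simple graph."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue  # local coefficient taken as 0
        # Each edge among the neighbors is seen from both endpoints.
        links = sum(len(adj[a] & nbrs) for a in nbrs) / 2
        total += 2 * links / (d * (d - 1))
    return total / len(adj)
```

Adding an edge that closes no triangle raises the degree of its
endpoints without adding links among their neighbors, which can
only lower or preserve their local coefficients.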

Average Path Length (APL): Figure 6 shows the average
path lengths for the anonymized graphs and the original
graphs on the three datasets. The APL of the graph anonymized
by the ADD&DEL algorithm is very close to the APL
of the original graph. By both adding and deleting edges, the
ADD&DEL algorithm can preserve more utility than the
ADD algorithm. In addition, the differences in APL between
the graphs anonymized by our methods and the original
graphs are very small; the largest difference is
0.8, when k is set to 100 on the Cora dataset.

Betweenness Centrality (BC): All the plots of the average
betweenness centralities are very similar to the plots of the
APL. Hence we show the distribution of the betweenness
centralities of all vertices in Figure 7. Due to space
constraints, we only show the results on Cora. The sub-figures
in Figures 7(a), 7(b) and 7(c) enlarge the details where the
frequency ranges from 0 to 100. Clearly, in Figures 7(a) and
7(b), ADD&DEL performs better than the ADD algorithm with
GreedyGroup, and shows little sensitivity to the value of k,
while ADD with GreedyGroup degrades as k increases. Figure 7(c)
also shows that ADD&DEL performs better than the ADD
algorithms.

Percentages of edges changed: No vertex was added in any
of the cases considered under ADD and ADD&DEL, and neither
algorithm deletes vertices, so we consider only the edge
changes. Figure 8 shows the edge changes on the original
graphs. The changes for ADD&DEL include the ratios of both
edges added and edges removed. The ADD&DEL algorithm changed
the fewest edges, and the ADD algorithm with GreedyGroup added
fewer edges than the algorithm with IntuitGroup.
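
The edge-change ratios can be computed by comparing edge sets, as
in the following Python sketch. It assumes that both ratios are
taken relative to the number of edges in the original graph, which
the text does not state explicitly; the function names are
hypothetical.

```python
def edge_set(adj):
    """Set of undirected edges of a simple graph given as neighbor sets."""
    return {frozenset((a, b)) for a in adj for b in adj[a]}


def edge_change_ratios(original, anonymized):
    """Return (added ratio, removed ratio) relative to the original edges."""
    e0, e1 = edge_set(original), edge_set(anonymized)
    added = e1 - e0      # edges present only after anonymization
    removed = e0 - e1    # edges present only before anonymization
    return len(added) / len(e0), len(removed) / len(e0)
```

For the ADD algorithms the removed ratio is zero by construction,
so only the first component is informative.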

From the above evaluation, we can see that our algorithms 
can preserve the utility of the original graph effectively. 
Among them, ADD&DEL performs better than the ADD al- 
gorithm, and GreedyGroup performs better than IntuitGroup. 

D. Evaluating the KDA Algorithm 

In this subsection, we evaluate the performance of the
KDA algorithm in Section III-D, and compare it with the
classic k-degree anonymization algorithm in [6].

Since no new triangles are formed when the KDA algorithm
adds new edges, the clustering coefficient decreases
a little as k increases, as shown in Figures 9(a), 10(a)
and 11(a). Our algorithm performs better than the classic
k-degree anonymization on this measure. Since new edges
are added to the graph, the APL value decreases a little
as k increases, as shown in Figures 9(b), 10(b) and 11(b).
Because we also consider k-NMF anonymity, the classic k-degree
anonymization performs a little better than our algorithm
on the APL measure. However, when the APL of the graph is
large, our algorithm can perform better than the classic
k-degree anonymization, as shown in Figure 9(b). The results
show that our algorithm preserves utility well while
protecting privacy by carefully exploiting the graph
property. The classic k-degree anonymization makes less
effort in this respect beyond minimizing the number of edges
added. Figures 9(c), 10(c) and 11(c) show the distributions
of betweenness centrality of graphs anonymized by the
KDA algorithm when k_deg is set to 10, 20 and 30. The
distributions of the anonymized graphs are very similar
to the distributions of the original graphs, especially for
the ACM and Brightkite datasets. This shows that the KDA
algorithm can preserve much of the utility of the graph
anonymized by the k-NMF algorithms.

V. Conclusions 

In this paper, we have identified a new problem of
k-anonymity on the number of mutual friends, which protects
against the mutual friend attack in social network
publication. To solve this problem, we designed two heuristic
algorithms which consider the utility of the graph. We also
devised an algorithm to ensure k-degree anonymity
on top of k-NMF anonymity. The experimental results
demonstrate that our approaches can ensure k-NMF
anonymity while preserving much of the utility of the original
social networks.
Figure 9: k-degree anonymization on 20-NMF anonymized graph of ACM
Figure 10: k-degree anonymization on 25-NMF anonymized graph of Cora
Figure 11: k-degree anonymization on 25-NMF anonymized graph of Brightkite